Gaussian Mixture Model for Handwritten Script Identification
نویسنده
چکیده
This paper presents a Gaussian Mixture Model (GMM) to identify the script of handwritten words of Roman, Devanagari, Kannada and Telugu scripts. It emphasizes the significance of directional energies for identification of script of the word. It is robust to varied image sizes and different styles of writing. A GMM is modeled using a set of six novel features derived from directional energy distributions of the underlying image. The standard deviation of directional energy distributions are computed by decomposing an image matrix into right and left diagonals. Furthermore, deviation of horizontal and vertical distributions of energies is also built-in to GMM. A dataset of 400 images out of 800 (200 of each script) are used for training GMM and the remaining is for testing. An exhaustive experimentation is carried out at bi-script, tri-script and multi-script level and achieved script identification accuracies in percentage as 98.7, 98.16 and 96.91 respectively.
منابع مشابه
Handwritten Script Identification: Fusion based Approaches
Script identification is one of the preprocessing steps in any document image processing task. Script identification in printed documents has achieved a greater attention whereas script identification in handwritten documents has achieved less attention from document research community. Almost all the existing works have made attempts on identifying suitable features or classifiers for handwrit...
متن کاملIdentification of Arabic/French Handwritten/Printed Words using GMM-Based System
The discrimination between languages is one of the first steps in the problem of automatic documents text recognition. In many documents, such as bank checks and application forms, printed and handwritten texts are mixed. In this paper, an automatic identification system of Arabic and French words in both handwritten and printed script based on Gaussian Mixture Models (GMMs) was presented. A fi...
متن کاملWord level script identification for scanned document images
In this paper, we compare the performance of three classifiers used to identify the script of words in scanned document images. In both training and testing, a Gabor filter is applied and 16 channels of features are extracted. Three classifiers (Support Vector Machines (SVM), Gaussian Mixture Model (GMM) and k -Nearest-Neighbor (k -NN)) are used to identify different scripts at the word level (...
متن کاملMixture of Experts for Persian handwritten word recognition
This paper presents the results of Persian handwritten word recognition based on Mixture of Experts technique. In the basic form of ME the problem space is automatically divided into several subspaces for the experts, and the outputs of experts are combined by a gating network. In our proposed model, we used Mixture of Experts Multi Layered Perceptrons with Momentum term, in the classification ...
متن کاملHandwritten Script Identification from a Bi-Script Document at Line Level using Gabor Filters
In a country like India where more number of scripts are in use, automatic identification of printed and handwritten script facilitates many important applications including sorting of document images and searching online archives of document images. In this paper, a Gabor feature based approach is presented to identify different Indian scripts from handwritten document images. Eight popular In...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1303.2751 شماره
صفحات -
تاریخ انتشار 2013